Short and simple. I've got a huge list of date-times like this as strings:
Jun 1 2005 1:33PM
Aug 28 1999 12:00AM
I'm going to be shoving these back into proper datetime fields in a database so I need to magic them into real datetime objects.
Any help (even if it's just a kick in the right direction) would be appreciated.
Edit: This is going through Django's ORM so I can't use SQL to do the conversion on insert.
Reposted from: https://stackoverflow.com/questions/466345/converting-string-into-datetime
datetime.strptime is the main routine for parsing strings into datetimes. It can handle all sorts of formats, with the format determined by a format string you give it:
from datetime import datetime
datetime_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
The resulting datetime object is timezone-naive.
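For the whole list mentioned in the question, the same call can be applied in a list comprehension. This is only a minimal sketch; the names `FMT`, `raw_dates` and `parsed` are made up for illustration, and month-name parsing with `%b` assumes an English (or C) locale.

```python
from datetime import datetime

# Hypothetical names, not from the question
FMT = '%b %d %Y %I:%M%p'
raw_dates = ['Jun 1 2005 1:33PM', 'Aug 28 1999 12:00AM']

# Parse every string with the same format; a mismatch raises ValueError
parsed = [datetime.strptime(s, FMT) for s in raw_dates]

for dt in parsed:
    print(dt.isoformat())
```

The resulting objects are still naive, so a timezone would have to be attached separately if the database column expects one.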
Links:
Python documentation for strptime/strftime format strings: Python 2, Python 3
strftime.org is also a really nice reference for strftime
Notes:
strptime = "string parse time"
strftime = "string format time"
Check out strptime in the time module. It is the inverse of strftime.
$ python
>>> import time
>>> time.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
time.struct_time(tm_year=2005, tm_mon=6, tm_mday=1,
tm_hour=13, tm_min=33, tm_sec=0,
tm_wday=2, tm_yday=152, tm_isdst=-1)
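Since the question asks for datetime objects rather than a struct_time, the result of time.strptime can be converted by passing its first six fields to the datetime constructor. A small sketch, with variable names chosen for illustration:

```python
import time
from datetime import datetime

st = time.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')

# The first six fields of a struct_time are
# year, month, day, hour, minute, second
dt = datetime(*st[:6])
print(dt)  # 2005-06-01 13:33:00
```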
Use the third party dateutil library:
from dateutil import parser
dt = parser.parse("Aug 28 1999 12:00AM")
It can handle most date formats, including the one you need to parse. It's more convenient than strptime as it can guess the correct format most of the time.
It is very useful for writing tests, where readability is more important than performance.
You can install it with:
pip install python-dateutil
Something that isn't mentioned here and is useful: adding a suffix to the day. I decoupled the suffix logic so you can use it for any number you like, not just dates.
import time
def num_suffix(n):
    '''
    Returns the suffix for any given int
    '''
    suf = ('th', 'st', 'nd', 'rd')
    n = abs(n)  # wise guy
    tens = int(str(n)[-2:])
    units = n % 10
    if tens > 10 and tens < 20:
        return suf[0]  # teens with 'th'
    elif units <= 3:
        return suf[units]
    else:
        return suf[0]  # 'th'
def day_suffix(t):
    '''
    Returns the suffix of the given struct_time day
    '''
    return num_suffix(t.tm_mday)
# Examples
print(num_suffix(123))
print(num_suffix(3431))
print(num_suffix(1234))
print('')
print(day_suffix(time.strptime("1 Dec 00", "%d %b %y")))
print(day_suffix(time.strptime("2 Nov 01", "%d %b %y")))
print(day_suffix(time.strptime("3 Oct 02", "%d %b %y")))
print(day_suffix(time.strptime("4 Sep 03", "%d %b %y")))
print(day_suffix(time.strptime("13 Nov 90", "%d %b %y")))
print(day_suffix(time.strptime("14 Oct 10", "%d %b %y")))
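The suffix idea can be combined with strftime, which has no directive for an ordinal day. The following is a rough sketch rather than the answer's own code: `ordinal` and `format_with_ordinal` are hypothetical helper names, and the suffix rule mirrors the one above.

```python
from datetime import datetime

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    n = abs(n)
    if 10 < n % 100 < 20:
        suffix = 'th'  # 11th .. 19th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '%d%s' % (n, suffix)

def format_with_ordinal(dt):
    """Format a datetime as e.g. '1st Jun 2005' (hypothetical helper)."""
    return '%s %s' % (ordinal(dt.day), dt.strftime('%b %Y'))

dt = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
print(format_with_ordinal(dt))  # 1st Jun 2005
```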
I have put together a project that can convert some really neat expressions. Check out timestring.
pip install timestring
>>> import timestring
>>> timestring.Date('monday, aug 15th 2015 at 8:40 pm')
<timestring.Date 2015-08-15 20:40:00 4491909392>
>>> timestring.Date('monday, aug 15th 2015 at 8:40 pm').date
datetime.datetime(2015, 8, 15, 20, 40)
>>> timestring.Range('next week')
<timestring.Range From 03/10/14 00:00:00 to 03/03/14 00:00:00 4496004880>
>>> (timestring.Range('next week').start.date, timestring.Range('next week').end.date)
(datetime.datetime(2014, 3, 10, 0, 0), datetime.datetime(2014, 3, 14, 0, 0))
Many timestamps have an implied timezone. To ensure that your code will work in every timezone, you should use UTC internally and attach a timezone each time a foreign object enters the system.
Python 3.2+:
>>> import datetime
>>> datetime.datetime.strptime(
... "March 5, 2014, 20:13:50", "%B %d, %Y, %H:%M:%S"
... ).replace(tzinfo=datetime.timezone(datetime.timedelta(hours=-3)))
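To follow the "use UTC internally" advice, the aware value can then be converted with astimezone. A minimal sketch, assuming (as in the snippet above) that the source wrote the timestamp at a fixed UTC-3 offset; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("March 5, 2014, 20:13:50", "%B %d, %Y, %H:%M:%S")

# Assumption: the string was written in a zone three hours behind UTC
aware = naive.replace(tzinfo=timezone(timedelta(hours=-3)))

# Normalize to UTC for internal use
as_utc = aware.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2014-03-05T23:13:50+00:00
```

A fixed offset does not model daylight saving time; a named zone would be needed for that.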
A timezone-aware datetime object example for Django:
import datetime
from django.utils.timezone import get_current_timezone, make_aware
tz = get_current_timezone()
fmt = '%b %d %Y %I:%M%p'
date_object = datetime.datetime.strptime('Jun 1 2005 1:33PM', fmt)
# make_aware works with both pytz and zoneinfo time zones;
# tz.localize() exists only on pytz time zone objects
date_obj = make_aware(date_object, tz)
This conversion matters when USE_TZ = True in your Django settings. Without it, saving a naive datetime produces a warning like:
RuntimeWarning: DateTimeField MyModel.created received a naive datetime (2016-03-04 00:00:00) while time zone support is active.
Remember this and you won't get confused by datetime conversion again:
String to datetime object = strptime
datetime object to other formats = strftime
Jun 1 2005 1:33PM
corresponds to
%b %d %Y %I:%M%p
%b  Month as locale's abbreviated name (Jun)
%d  Day of the month as a zero-padded decimal number (1)
%Y  Year with century as a decimal number (2005)
%I  Hour (12-hour clock) as a zero-padded decimal number (01)
%M  Minute as a zero-padded decimal number (33)
%p  Locale's equivalent of either AM or PM (PM)
So you need strptime, i.e. converting a string to a datetime:
>>> dates = []
>>> dates.append('Jun 1 2005 1:33PM')
>>> dates.append('Aug 28 1999 12:00AM')
>>> from datetime import datetime
>>> for d in dates:
...     date = datetime.strptime(d, '%b %d %Y %I:%M%p')
...     print(type(date))
...     print(date)
...
Output
<class 'datetime.datetime'>
2005-06-01 13:33:00
<class 'datetime.datetime'>
1999-08-28 00:00:00
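One detail about the directives is worth a quick check: strptime accepts the day and hour without zero padding, while strftime with the same format string writes them zero-padded. A small round-trip sketch (assuming an English or C locale for `%b` and `%p`):

```python
from datetime import datetime

fmt = '%b %d %Y %I:%M%p'

# Parsing tolerates '1' for the day and '1' for the hour
dt = datetime.strptime('Jun 1 2005 1:33PM', fmt)

# Formatting pads both to two digits
round_trip = dt.strftime(fmt)
print(round_trip)  # Jun 01 2005 01:33PM
```

So a parse followed by a format does not necessarily reproduce the input string character for character.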
What if your dates come in different formats? You can use pandas or dateutil's parser.parse:
>>> from dateutil import parser
>>> dates = []
>>> dates.append('12 1 2017')
>>> dates.append('1 1 2017')
>>> dates.append('1 12 2017')
>>> dates.append('June 1 2017 1:30:00AM')
>>> [parser.parse(x) for x in dates]
Output
[datetime.datetime(2017, 12, 1, 0, 0), datetime.datetime(2017, 1, 1, 0, 0), datetime.datetime(2017, 1, 12, 0, 0), datetime.datetime(2017, 6, 1, 1, 30)]
You can use easy_date to make it easy:
import date_converter
converted_date = date_converter.string_to_datetime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
Here are two solutions using Pandas to convert dates formatted as strings into datetime.date objects.
import pandas as pd
dates = ['2015-12-25', '2015-12-26']
# 1) Use a list comprehension.
>>> [d.date() for d in pd.to_datetime(dates)]
[datetime.date(2015, 12, 25), datetime.date(2015, 12, 26)]
# 2) Convert the dates to a DatetimeIndex and extract the python dates.
>>> pd.DatetimeIndex(dates).date.tolist()
[datetime.date(2015, 12, 25), datetime.date(2015, 12, 26)]
Timings
# pd.DatetimeIndex(start=..., end=..., freq=...) was removed in pandas 1.0; use pd.date_range
dates = pd.date_range(start='2000-1-1', end='2010-1-1', freq='D').date.tolist()
>>> %timeit [d.date() for d in pd.to_datetime(dates)]
# 100 loops, best of 3: 3.11 ms per loop
>>> %timeit pd.DatetimeIndex(dates).date.tolist()
# 100 loops, best of 3: 6.85 ms per loop
And here is how to convert the OP's original date-time examples:
datetimes = ['Jun 1 2005 1:33PM', 'Aug 28 1999 12:00AM']
>>> pd.to_datetime(datetimes).to_pydatetime().tolist()
[datetime.datetime(2005, 6, 1, 13, 33),
datetime.datetime(1999, 8, 28, 0, 0)]
There are many options for converting from the strings to Pandas Timestamps using to_datetime, so check the docs if you need anything special.
Likewise, Timestamps have many properties and methods that can be accessed in addition to .date.
In [34]: import datetime
In [35]: _now = datetime.datetime.now()
In [36]: _now
Out[36]: datetime.datetime(2016, 1, 19, 9, 47, 0, 432000)
In [37]: print(_now)
2016-01-19 09:47:00.432000
In [38]: _parsed = datetime.datetime.strptime(str(_now),"%Y-%m-%d %H:%M:%S.%f")
In [39]: _parsed
Out[39]: datetime.datetime(2016, 1, 19, 9, 47, 0, 432000)
In [40]: assert _now == _parsed
Create a small utility function like:
def date(datestr="", format="%Y-%m-%d"):
    from datetime import datetime
    if not datestr:
        return datetime.today().date()
    return datetime.strptime(datestr, format).date()
This is versatile enough:
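A brief usage sketch of the helper, repeated here so the snippet runs on its own. The alias `date_type` and the sample strings are illustrative choices, not part of the original answer.

```python
from datetime import date as date_type, datetime

def date(datestr="", format="%Y-%m-%d"):
    # With no argument, fall back to today's date
    if not datestr:
        return datetime.today().date()
    return datetime.strptime(datestr, format).date()

print(date("2017-12-21"))              # default ISO-style format
print(date("21/12/2017", "%d/%m/%Y"))  # overriding the format
print(date())                          # today's date
```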
The datetime Python module is good for getting date time and converting date time formats.
import datetime
new_date_format1 = datetime.datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
new_date_format2 = datetime.datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p').strftime('%Y/%m/%d %I:%M%p')
print(new_date_format1)
print(new_date_format2)
Output:
2005-06-01 13:33:00
2005/06/01 01:33PM
arrow offers many useful functions for dates and times. This bit of code provides an answer to the question and shows that arrow is also capable of formatting dates easily and displaying information for other locales.
>>> import arrow
>>> dateStrings = [ 'Jun 1 2005 1:33PM', 'Aug 28 1999 12:00AM' ]
>>> for dateString in dateStrings:
...     dateString
...     arrow.get(dateString.replace(' ',' '), 'MMM D YYYY H:mmA').datetime
...     arrow.get(dateString.replace(' ',' '), 'MMM D YYYY H:mmA').format('ddd, Do MMM YYYY HH:mm')
...     arrow.get(dateString.replace(' ',' '), 'MMM D YYYY H:mmA').humanize(locale='de')
...
'Jun 1 2005 1:33PM'
datetime.datetime(2005, 6, 1, 13, 33, tzinfo=tzutc())
'Wed, 1st Jun 2005 13:33'
'vor 11 Jahren'
'Aug 28 1999 12:00AM'
datetime.datetime(1999, 8, 28, 0, 0, tzinfo=tzutc())
'Sat, 28th Aug 1999 00:00'
'vor 17 Jahren'
See http://arrow.readthedocs.io/en/latest/ for more.
See my answer.
In real-world data this is a real problem: multiple, mismatched, incomplete, inconsistent and multilanguage/region date formats, often mixed freely in one dataset. It's not ok for production code to fail, let alone go exception-happy like a fox.
We need to try...except multiple datetime formats fmt1, fmt2, ..., fmtn and suppress/handle the exceptions (from strptime()) for all those that mismatch (and in particular, avoid needing a yukky n-deep indented ladder of try..except clauses). From my solution:
from datetime import datetime
def try_strptime(s, fmts=('%d-%b-%y', '%m/%d/%Y')):
    for fmt in fmts:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:  # this format did not match; try the next one
            continue
    return None  # or reraise the ValueError if no format matched, if you prefer
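A short usage sketch of the multi-format approach on a mixed list. The function is repeated so the snippet is self-contained, and the sample strings are made up for illustration.

```python
from datetime import datetime

def try_strptime(s, fmts=('%d-%b-%y', '%m/%d/%Y')):
    # Return the first successful parse, or None if nothing matches
    for fmt in fmts:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None

# Hypothetical sample data mixing two formats and one bad value
samples = ['21-Dec-17', '12/21/2017', 'not a date']
results = [try_strptime(s) for s in samples]

for raw, parsed in zip(samples, results):
    print(raw, '->', parsed)
```

Returning None keeps the loop going over bad rows; the caller still has to decide what to do with them.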
If you only need a date, you can build it manually by passing the individual fields:
>>> import datetime
>>> date = datetime.date(int('2017'),int('12'),int('21'))
>>> date
datetime.date(2017, 12, 21)
>>> type(date)
<class 'datetime.date'>
You can also split the string and pass the parts to build a date:
selected_month_rec = '2017-09-01'
year, month, day = (int(part) for part in selected_month_rec.split('-'))
date_obj = datetime.date(year, month, day)
The result is a datetime.date object.
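For strings already in YYYY-MM-DD form, the standard library can do the splitting itself. A minimal sketch, assuming Python 3.7 or later for `date.fromisoformat`; the strptime line works on older versions too.

```python
from datetime import date, datetime

selected_month_rec = '2017-09-01'

# Python 3.7+: parse an ISO-format date directly
d1 = date.fromisoformat(selected_month_rec)

# Equivalent with strptime, then dropping the time part
d2 = datetime.strptime(selected_month_rec, '%Y-%m-%d').date()

print(d1, d1 == d2)
```

Both raise ValueError on malformed input, which the manual int() approach also does, but with a less specific message.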
I personally like the solution using the parser module, which is the second answer to this question and is beautiful, as you don't have to construct any string literals to get it working. However, one downside is that it is much slower than the accepted answer with strptime (roughly eight times slower in the timing below).
from dateutil import parser
from datetime import datetime
import timeit
def dt():
    dt = parser.parse("Jun 1 2005 1:33PM")
def strptime():
    datetime_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
print(timeit.timeit(stmt=dt, number=10**5))
print(timeit.timeit(stmt=strptime, number=10**5))
>10.70296801342902
>1.3627995655316933
As long as you are not doing this a million times over and over again, I still think the parser method is more convenient and will handle most of the time formats automatically.
This helps when converting a string to a datetime that should also carry a time zone:
def convert_string_to_time(date_string, timezone):
    from datetime import datetime
    import pytz
    # Parse only the first 26 characters, dropping the trailing UTC offset
    date_time_obj = datetime.strptime(date_string[:26], '%Y-%m-%d %H:%M:%S.%f')
    # Attach the named time zone to the naive result
    date_time_obj_timezone = pytz.timezone(timezone).localize(date_time_obj)
    return date_time_obj_timezone
date = '2018-08-14 13:09:24.543953+00:00'
TIME_ZONE = 'UTC'
date_time_obj_timezone = convert_string_to_time(date, TIME_ZONE)
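When the string already carries its own offset, as the sample above does, the standard library can keep it instead of slicing it off. A hedged alternative sketch, assuming Python 3.7 or later for `datetime.fromisoformat`; it is not the function from the answer above.

```python
from datetime import datetime, timedelta

date_string = '2018-08-14 13:09:24.543953+00:00'

# Python 3.7+: parses the date, time, microseconds and the +00:00 offset
aware = datetime.fromisoformat(date_string)

print(aware.utcoffset())  # 0:00:00
print(aware.tzinfo is not None)
```

This yields a fixed-offset tzinfo rather than a named zone, so it does not replace pytz when a region such as a city-based zone is required.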
For the Unix/MySQL format 2018-10-15 20:59:29:
from datetime import datetime
datetime_object = datetime.strptime('2018-10-15 20:59:29', '%Y-%m-%d %H:%M:%S')